SAS Viya Search Demo

Steps:

  1. Import modules
  2. Start CAS Connection
  3. Upload CSV to CAS Table
  4. Create a search index
  5. Add data to index
  6. Search index (term search and GEO filtering)
  7. Display Geo aware query results on google map
  8. Run facet aggregation
  9. Run date histogram aggregation and draw timeline

Import modules and set global variables


In [1]:
import pandas as pd 
import json
import swat
import plotly.plotly as py
import plotly.graph_objs as go
import folium

from ipywidgets import * 
from IPython.display import display 
from IPython.display import clear_output
from IPython.html import widgets
from plotly.offline import init_notebook_mode, iplot
from pprint import pprint
from folium.plugins import MarkerCluster
from folium import CircleMarker

init_notebook_mode()

index_name = 'SFPD_INCIDENTS_2015_INDEX'


/sas3rd/dev/mva-v9cas/lax/python/anaconda3/lib/python3.4/site-packages/IPython/html.py:14: ShimWarning:

The `IPython.html` package has been deprecated. You should import from `notebook` instead. `IPython.html.widgets` has moved to `ipywidgets`.

Connect to CAS Server


In [2]:
conn = swat.CAS('cas01', 19640)

Upload local CSV data file to CAS


In [3]:
dataTable = conn.upload('data/SFPD_Incidents_2015.csv')


NOTE: Cloud Analytic Services made the uploaded file available as table SFPD_INCIDENTS_2015 in caslib CASUSER(kesmit).
NOTE: The table SFPD_INCIDENTS_2015 has been created in caslib CASUSER(kesmit) from binary data uploaded to Cloud Analytic Services.

Create a search index


In [4]:
if conn.tableexists(name=index_name):
    conn.droptable(name=index_name)

conn.search.buildindex(index={'name':index_name, 'promote':True},
                    schema='[{"field":"Category","isKey":false,"isIndex":true,"fieldType":"string","isMultiValue":false},\
                    {"field":"Descript","isKey":false,"isIndex":true,"fieldType":"string","isMultiValue":false},\
                    {"field":"DayOfWeek","isKey":false,"isIndex":true,"fieldType":"string","isMultiValue":false},\
                    {"field":"PdDistrict","isKey":false,"isIndex":true,"fieldType":"string","isMultiValue":false},\
                    {"field":"Date","isKey":false,"fieldType":"datetime","isMultiValue":false},\
                    {"field":"Address","isKey":false,"isIndex":true,"fieldType":"string","isMultiValue":false},\
                    {"field":"Resolution","isKey":false,"isIndex":true,"fieldType":"string","isMultiValue":false},\
                    {"field":"GeoPoint","isKey":false,"fieldType":"geo","isMultiValue":false},\
                    {"field":"X","isKey":false,"fieldType":"float","isMultiValue":false},\
                    {"field":"Y","isKey":false,"fieldType":"float","isMultiValue":false}]')


NOTE: Cloud Analytic Services dropped table SFPD_INCIDENTS_2015_INDEX from caslib CASUSER(kesmit).
WARNING:  The key field has not been specified; therefore, the document's MD5 HASH will be used as the key.
Out[4]:
§ errorCode
0

§ segments
{'failed': 0, 'total': 1, 'successful': 1}

elapsed 0.0783s · user 0.049s · sys 0.027s · mem 32.2MB

Append data to index from CAS table

Add documents to newly created index from data table.


In [5]:
conn.search.appendindex(index={'name':index_name},
                        table={'name':'SFPD_INCIDENTS_2015'})


WARNING:  The following field does not appear in the schema of the CAS table and so it will not be added to the index: IncidntNum
Out[5]:
§ errorCode
0

§ rows
{'validRows': 156206, 'invalidRows': 0}

§ segments
{'failed': 0, 'total': 0, 'successful': 16}

elapsed 10.3s · user 58.5s · sys 7.54s · mem 0.0872MB

MatchAll query

This step produces 10 matches and a CAS table (named 'allResults') that contains all results. It can be followed by "Display results in map", "Run facet aggregation" and "Draw timeline" steps, to further view and analyze results.


In [6]:
conn.search.searchindex(casout={'name':'allResults','replace':True},
                        index={'name':index_name},
                        json='{"query":{"matchAll":true}}',
                        jsonout=False
                       )


NOTE:  No output object is detected in the input string; therefore, all fields were returned.
Out[6]:
§ searchResults
searchResults
id relevance PdDistrict GeoPoint DayOfWeek X Y Address Resolution Descript Category Date
0 000033cfbcabb8324edbdc11771b0577 1.0 NORTHERN {"type":"point","coordinates":[-122.4278224,37... Monday -122.427822 37.775787 BUCHANAN ST / LINDEN ST NONE PETTY THEFT FROM UNLOCKED AUTO LARCENY/THEFT 1.751850e+09
1 00018b71266e10d82d38a4b4ea5d36e8 1.0 CENTRAL {"type":"point","coordinates":[-122.4110689,37... Sunday -122.411069 37.788856 600 Block of SUTTER ST NONE BATTERY ASSAULT 1.752311e+09
2 0001c5545da75836af10460f5fc0388d 1.0 RICHMOND {"type":"point","coordinates":[-122.4396657,37... Thursday -122.439666 37.784807 1600 Block of DIVISADERO ST NONE LOST PROPERTY NON-CRIMINAL 1.757559e+09
3 000350ae494774ab214bd1e31dfb024e 1.0 RICHMOND {"type":"point","coordinates":[-122.4588606,37... Tuesday -122.458861 37.781255 ARGUELLO BL / GEARY BL NONE GRAND THEFT PICKPOCKET LARCENY/THEFT 1.764613e+09
4 000392e2f75214b5c26eea9184d5a951 1.0 TENDERLOIN {"type":"point","coordinates":[-122.4093131,37... Sunday -122.409313 37.784348 MASON ST / EDDY ST ARREST, BOOKED TRAFFIC VIOLATION ARREST OTHER OFFENSES 1.742089e+09
5 0003bf0366e740a240bc846a6bb6ebfa 1.0 TARAVAL {"type":"point","coordinates":[-122.5017653,37... Sunday -122.501765 37.763403 1200 Block of 42ND AV NONE MISCELLANEOUS INVESTIGATION OTHER OFFENSES 1.759018e+09
6 000430d3a7cedb52bafad7ebdb0ef621 1.0 CENTRAL {"type":"point","coordinates":[-122.4197994,37... Saturday -122.419799 37.802925 CHESTNUT ST / HYDE ST NONE GRAND THEFT FROM LOCKED AUTO LARCENY/THEFT 1.754690e+09
7 00045238febb177a2fc518b7674e8019 1.0 MISSION {"type":"point","coordinates":[-122.4181243,37... Thursday -122.418124 37.761090 300 Block of CAPP ST ARREST, BOOKED DOMESTIC VIOLENCE SECONDARY CODES 1.744852e+09
8 000596b9a1b1911ba1b769b403a26615 1.0 PARK {"type":"point","coordinates":[-122.4427322,37... Sunday -122.442732 37.756374 0 Block of GLENDALE ST NONE DOMESTIC VIOLENCE SECONDARY CODES 1.739689e+09
9 0005ef90041d8916780f6a3ad8c2685f 1.0 MISSION {"type":"point","coordinates":[-122.4143478,37... Saturday -122.414348 37.748134 3200 Block of CESAR CHAVEZ ST NONE GRAND THEFT FROM LOCKED AUTO LARCENY/THEFT 1.740195e+09

§ OutputCasTables
casLib Name Rows Columns casTable
0 CASUSER(kesmit) allResults 155988 12 CASTable('allResults', caslib='CASUSER(kesmit)')

§ result
{"status":{"errorCode":0,"segments":{"total":8,"successful":16,"failed":0}},"hits":{"total":155988}}

elapsed 0.938s · user 6.29s · sys 1.32s · mem 32.5MB

Search the index (term search and geo filter)

This steps displays two input text fields. The first field is to input search term. You may also do a phrase search by enclosing the whole phrase in escaped double quotation marks. For example, \"a phrase\". The radius input field can be left blank: in this case all results matching the search term in the index (regardless geo locations) will be reported. When radius is given, a circle with that radius centered at (lat: 37.79, long: -122.415) is formed and only results falling within that circle will be reported. All matching records will be saved in a CAS table 'allResults' for later steps to consume.


In [8]:
text = widgets.Text(description="Search term", width=200) 
radius = widgets.Text(description="Radius (km)", width=200)

display(text)
display(radius)

def handle_submit(sender):
  if radius.value=='':
    results = conn.search.searchindex(
      index={'name':index_name},   
      json='{"query":{"simplequery":{"query":"' + text.value + '"}},"output":{"fields":["Descript",\
        "Category","PdDistrict","DayOfWeek","Resolution","Address","X","Y","Date"]}}',
      jsonout=False, 
      casout={'name':'allResults','replace':True})
    
  else:
    results = conn.search.searchindex(
      index={'name':index_name},json='{"query":{"andquery":[{"simplequery":{"query":"' \
        + text.value + '"}},{"geofilter":{"field":"GeoPoint","shape":{"coordinates":[-122.415,37.79],\
        "type":"circle","radius":'+radius.value+'}}}]}, "output":{"fields":["Descript","Category","PdDistrict",\
        "DayOfWeek","Resolution","Address","X","Y","Date"]}}',
      jsonout=False, 
      casout={'name':'allResults','replace':True})
    
  clear_output()
  display(results)

text.on_submit(handle_submit) 
radius.on_submit(handle_submit)


§ searchResults
searchResults
id relevance Descript Category PdDistrict DayOfWeek Resolution Address X Y Date
0 70acd5be5396c97f6732e50debae656f 0.5 GRAND THEFT FROM LOCKED AUTO LARCENY/THEFT MISSION Thursday NONE 24TH ST / CAPP ST -122.417360 37.752306 1.741853e+09
1 1d14d181a86d02b9e0daa1c68fca0d13 0.5 PETTY THEFT SHOPLIFTING LARCENY/THEFT NORTHERN Saturday ARREST, BOOKED 1300 Block of WEBSTER ST -122.431046 37.783030 1.741976e+09
2 cac73516ce1a12d2919af2f27238afa8 0.5 GRAND THEFT PICKPOCKET LARCENY/THEFT CENTRAL Friday NONE KEARNY ST / CLAY ST -122.404799 37.794453 1.741901e+09
3 541857b6ff50a930d39076d59e547531 0.5 PETTY THEFT FROM A BUILDING LARCENY/THEFT MISSION Friday NONE 1700 Block of 15TH ST -122.422668 37.766557 1.756444e+09
4 bd516f60d306aa22ee565719fc71d77e 0.5 GRAND THEFT FROM A BUILDING LARCENY/THEFT INGLESIDE Sunday ARREST, BOOKED 1100 Block of ALEMANY BL -122.430220 37.730733 1.755975e+09
5 2dce134c3a12f960dff0040cffd4fb8a 0.5 LICENSE PLATE OR TAB, THEFT OF LARCENY/THEFT MISSION Sunday NONE 800 Block of CAPP ST -122.417361 37.753093 1.741446e+09
6 6346caeab044714a2d202b7c5ee1c10d 0.5 LOST PROPERTY, PETTY THEFT LARCENY/THEFT TENDERLOIN Saturday NONE FULTON ST / HYDE ST -122.415128 37.779787 1.741984e+09
7 cee9d277df7a19235668249647c8c404 0.5 PETTY THEFT FROM A BUILDING LARCENY/THEFT TENDERLOIN Saturday NONE 100 Block of GOLDEN GATE AV -122.413070 37.782020 1.741977e+09
8 1f29381036c322d4e381c3271baa3d44 0.5 PETTY THEFT FROM LOCKED AUTO LARCENY/THEFT BAYVIEW Friday NONE 1600 Block of LANE ST -122.389832 37.731939 1.741934e+09
9 d7ac2a17399a514291b6f751d2bb0bc4 0.5 GRAND THEFT FROM LOCKED AUTO LARCENY/THEFT INGLESIDE Saturday NONE CONGO ST / MONTEREY BL -122.442022 37.731575 1.741955e+09

§ OutputCasTables
casLib Name Rows Columns casTable
0 CASUSER(kesmit) allResults 41927 11 CASTable('allResults', caslib='CASUSER(kesmit)')

§ result
{"status":{"errorCode":0,"segments":{"total":8,"successful":16,"failed":0}},"hits":{"total":41927}}

elapsed 0.426s · user 2.02s · sys 0.301s · mem 8.05MB

Display search results in map

This step draws search results in 'allResults' CAS table on the google map. When Geo filter radius is given, it draws a circle on the map corresponding to the filtering shape. Clustered markers on the map can be clicked and the map will zoom in to show more details in that area.


In [9]:
coords = conn.table.fetch(fetchvars=['X', 'Y'],
                          table={'name':'allResults'}, 
                          maxrows=5000, 
                          to=1000)

xs=coords.Fetch['X']
ys=coords.Fetch['Y']

SF_COORDINATES = (37.76, -122.45)

map = folium.Map(location=SF_COORDINATES, zoom_start=12)

locations = [];
for i, val in enumerate(xs):
    locations.append([ys[i], val])
        
h = folium.FeatureGroup(name='Hydroelectric')
h.add_children(MarkerCluster(locations=locations))
map.add_children(h)
map.add_child(folium.LatLngPopup()) 

try:
    radius
except NameError:
    pass
else:
    if radius.value != '':
        map.add_children(CircleMarker(location=[37.79,-122.415], 
                                      radius=float(radius.value)*1000, 
                                      fill_opacity=0.05))

display(map)


Run facet aggregation

This step run a facet count for all fields in one shot. See the facet counts in the result table.


In [10]:
response = conn.search.valuecount(json='{"count":5,"fields":["Category","DayOfWeek","PdDistrict","Resolution"]}',
                                  jsonout=False,
                                  table={'name':'allResults'})
display(response)


§ valueCount_results
Field FmtVar Ranking Freq
0 Category LARCENY/THEFT 0 41927
1 DayOfWeek Friday 0 6596
2 DayOfWeek Saturday 1 6497
3 DayOfWeek Wednesday 2 5908
4 DayOfWeek Thursday 3 5873
5 DayOfWeek Tuesday 4 5792
6 PdDistrict SOUTHERN 0 10878
7 PdDistrict NORTHERN 1 7450
8 PdDistrict CENTRAL 2 6897
9 PdDistrict RICHMOND 3 2970
10 PdDistrict TARAVAL 4 2724
11 Resolution NONE 0 39799
12 Resolution ARREST, BOOKED 1 1964
13 Resolution JUVENILE BOOKED 2 86
14 Resolution UNFOUNDED 3 52
15 Resolution EXCEPTIONAL CLEARANCE 4 17

elapsed 0.39s · user 0.731s · sys 2.9s · mem 8.45MB

Draw timeline

This step runs a daily date histogram aggregation against the field 'Date' first. then, it plots the time line using the daily crime count in the aggregation results. The timeline can be zoomed in by highlighting a subsection of the drawing.


In [11]:
# Run datetime histogram aggregation

resp = dict(conn.search.searchaggregate(table={'name':'allResults'},
        json='{"name":"daily incidents","groupby":[{"histogram":{"unit":"day","min":"2015-01-01","max":"2015-12-31",\
        "interval":1},"field":"Date"}]}',
        jsonout=True))

result = json.loads(resp['result'])
aggResult = dict(result['aggregation'])
groups = aggResult['group'];
x1=[]
y1=[]
for val in groups:
    groupMap = dict(val)
    x1.append(groupMap['name'])
    y1.append(groupMap['count'])

trace1 = go.Scatter(
    x=x1,
    y=y1,
    fill='tozeroy'
)

iplot([trace1])


NOTE:  CASIDX aggregation finished data mapping and is ready for tkcasagg
NOTE:  CASIDX aggregation successfully finished on table ALLRESULTS

In [12]:
conn.close()

In [ ]: